Abstract

Costa's "writing on dirty paper" result establishes that full state pre-cancellation can be attained in the Gel'fand-Pinsker problem with additive state and additive white Gaussian noise. This result holds under the assumption that full channel knowledge is available at both the transmitter and the receiver. In this work we consider the scenario in which the state is multiplied by an ergodic fading process which is not known at the encoder. We study both the case in which the receiver has knowledge of the fading and the case in which it does not: for both models we derive inner and outer bounds to capacity and determine the distance between the two bounds when possible. For the channel without fading knowledge at either the transmitter or the receiver, the gap between inner and outer bounds is finite for a class of fading distributions which includes a number of canonical fading models. In the capacity-approaching strategy for this class, the transmitter performs Costa's pre-coding against the mean value of the fading times the state, while the receiver treats the remaining signal as noise. For the case in which only the receiver has knowledge of the fading, we determine a finite gap between inner and outer bounds for two classes of discrete fading distributions. The first class of distributions is the one in which there exists a probability mass larger than one half, while the second class is the one in which the fading is uniformly distributed over values that are exponentially spaced apart. Unfortunately, the capacity in the case of a continuous fading distribution remains very hard to characterize.

Index Terms 

Gel'fand-Pinsker Problem; Writing on Fading Dirt; Ergodic Fading; Imperfect Channel Side Information

I. Introduction 

In the Gel'fand-Pinsker (GP) model [1], the output of a point-to-point memoryless channel is obtained as a function of the channel input, a noise term and a state variable which is non-causally provided to the transmitter but is unknown at the receiver. In this channel the state may represent the interference caused by another user in a wireless network, which is also communicated to the transmitter by the network infrastructure. In the original setup, both transmitter and receiver are assumed to have perfect channel knowledge: while it is reasonable to assume that a transmitter knows the channel toward its intended receiver and vice-versa, it is not always realistic to suppose that a transmitter knows the channel between an interfering user and the receiver. This is especially true in wireless networks, since there channel conditions vary continuously over time and reliable channel estimates are hard to obtain.

The work of S. Rini was partially funded by the Ministry of Science and Technology (MOST) under grant 103-2218-E-009-014-MY2. The work of S. Shamai was supported by the Israel Science Foundation (ISF) and by the European FP7 NEWCOM#.



Fig. 1. The Dirty Paper Channel with Fast Fading Dirt (DPC-FFD). The dotted line represents the state information provided to the transmitter.

The "writing on dirty paper" result from Costa [2] establishes a closed-form characterization of the capacity of the GP problem in the additive state and additive white Gaussian noise setting. Perhaps surprisingly, the presence of the state does not reduce the capacity of this model, regardless of the distribution or power of this sequence. In this work we are interested in characterizing the effect of fading on the capacity of this model and in determining the optimal transmission strategies in this scenario. In the literature, different variations of Costa's setup which also include fading have been considered. The "writing on fading dirt" channel in [3] is a variation of the channel of [2] in which both the channel input and the state sequence are multiplied by a fading value known at the receiver but not at the transmitter. The authors of [3] evaluate the achievable region with Costa's assignment and show that the rate loss from full state pre-cancellation vanishes in both the ergodic and the quasi-static fading case. In the "compound dirty-paper" channel of [4], only the state is multiplied by a quasi-static fading coefficient known at the receiver but unknown at the transmitter. For this model, an inner bound based on lattice strategies is derived to compensate for the channel uncertainty at the transmitter. Achievable rates under Gaussian signaling and lattice strategies for this channel are derived in [5], while outer and inner bounds to the capacity of the writing on fading dirt channel with phase fading are derived in [6]. The approximate capacity of this channel is obtained in [7] for the case of binomial and uniform phase fading.

In this paper we study the "writing on fading dirt" model, a variation of the classic model in which the state sequence is multiplied by an ergodic fading coefficient which is not known at the transmitter. We derive inner and outer bounds to capacity both for the case in which the fading is known at the receiver and for the case in which it is not. When neither the transmitter nor the receiver has fading knowledge, we show that the outer bound can be attained to within a finite gap for a class of fading distributions which includes the Gaussian, the uniform and the Rayleigh distributions but does not include the log-normal distribution. For the case in which only the receiver has fading knowledge, we show a finite gap between inner and outer bounds for two classes of discrete distributions: when the fading distribution has a probability mass greater than one half, and when it is uniformly distributed over a set of points that are exponentially spaced apart.

The remainder of the paper is organized as follows: Sec. II introduces the channel model and some related results. Sec. III investigates the capacity for the case in which neither the transmitter nor the receiver has fading knowledge, while Sec. IV focuses on the case in which only the receiver has fading knowledge. Finally, Sec. V concludes the paper.










Fig. 2. The Dirty Paper Channel with Fast Fading Dirt and Receiver Channel Side Information (DPC-FFD-RCSI).

II. Dirty Paper Channel with Fading Dirt 

In the Dirty Paper Channel with Fast Fading Dirt (DPC-FFD), also depicted in Fig. 1, the channel output is obtained as

$$Y_i = X_i + cA_iS_i + Z_i, \quad i \in [1 \ldots N], \tag{1}$$

for $c \in \mathbb{R}$, where $X_i$ is the channel input, $S_i$ the state, $A_i$ the fading realization and $Z_i$ the additive noise. The channel input $X_i$ is subject to a second moment constraint $\mathbb{E}[|X_i|^2] \le P$, while the state $S_i$ and the noise term $Z_i$ are distributed as

$$S_i \sim \mathcal{N}(\mu_S, 1), \quad Z_i \sim \mathcal{N}(0, 1), \quad \text{i.i.d.}, \tag{2}$$

where $\mathcal{N}(\mu, \sigma^2)$ indicates the Gaussian random variable (RV) with mean $\mu$ and variance $\sigma^2$. The fading RV $A_i$ is drawn from a distribution $P_A$ which has unit variance and mean $\mu_A$. The state sequence $S^N$ is assumed to be non-causally available at the transmitter, while the fading sequence $A^N$ is unknown at both the transmitter and the receiver.

A related model to the DPC-FFD in Fig. 1 is the model in which the fading sequence is provided to the receiver. We refer to this model as the Dirty Paper Channel with Fast Fading Dirt and Receiver Channel Side Information (DPC-FFD-RCSI), also depicted in Fig. 2. For the DPC-FFD-RCSI the receiver side information can be seen as an additional channel output, that is, the channel output is the vector $[Y_i, A_i]$ for $Y_i$ in (1).

Remark II.1. Mean of the state and the fading. The channel output in (1) can be rewritten as

$$
\begin{aligned}
Y &= X + c(A_0 + \mu_A)(S_0 + \mu_S) + Z\\
&= X + c\left(A_0S_0 + \mu_A S_0 + \mu_S A_0 + \mu_A\mu_S\right) + Z,
\end{aligned}
\tag{3}
$$

where $A_0 = A - \mu_A$ and $S_0 = S - \mu_S$. Each of the terms in (3) can be interpreted as follows:

• $c\mu_S A_0$ can be cancelled at the receiver when it possesses fading knowledge. Without receiver fading knowledge, this term is unknown at both the receiver and the transmitter and is equivalent to additive noise.

• $c\mu_A S_0$ can be pre-cancelled by the transmitter with Costa coding as in [2] (Costa pre-coding in the following).

• $cA_0S_0$ requires the cooperation of both transmitter and receiver, since each of them knows only one of the two factors in the multiplication.
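The decomposition in (3) can be sanity-checked numerically. The sketch below is only an illustration under assumptions not made in the text: the fading is drawn from a Gaussian law (any unit-variance law would do), and the values of $\mu_A$ and $\mu_S$ are arbitrary. It checks (3) sample by sample and compares the empirical variance of $AS$ with $1 + \mu_A^2 + \mu_S^2$, which is what one expects since the three random terms in (3) are zero-mean and mutually uncorrelated.

```python
import random
import statistics

def decompose(a, s, mu_a, mu_s):
    """Split a*s into the four terms of (3): a0*s0, mu_a*s0, mu_s*a0 and mu_a*mu_s."""
    a0, s0 = a - mu_a, s - mu_s
    return a0 * s0, mu_a * s0, mu_s * a0, mu_a * mu_s

def check_decomposition(mu_a=0.8, mu_s=0.5, n=200_000, seed=1):
    """Return the empirical Var(AS) and the closed form 1 + mu_a^2 + mu_s^2."""
    rng = random.Random(seed)
    products = []
    for _ in range(n):
        a = rng.gauss(mu_a, 1.0)  # unit-variance fading; the Gaussian law is an assumption
        s = rng.gauss(mu_s, 1.0)  # S ~ N(mu_s, 1) as in (2)
        # The identity (3) holds sample by sample, up to rounding
        assert abs(sum(decompose(a, s, mu_a, mu_s)) - a * s) < 1e-9
        products.append(a * s)
    return statistics.pvariance(products), 1.0 + mu_a ** 2 + mu_s ** 2

empirical, predicted = check_decomposition()
print(f"empirical Var(AS) = {empirical:.3f}, closed form = {predicted:.3f}")
```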

The DPC-FFD and the DPC-FFD-RCSI can be used to model the downlink scenario in which a base station is aware of the signal transmitted by a neighbouring base station but has only partial or no knowledge of the channel between the interferer and the intended receiver. In this scenario it is not clear whether the knowledge of the interfering message is at all useful at the base station, since the pre-coding operations heavily rely on the knowledge of the channel gains.


A. Related Results 

Gel'fand-Pinsker (GP) channel. The DPC-FFD and the DPC-FFD-RCSI are special cases of the GP problem, whose capacity is obtained in [1].

Theorem II.2. Capacity of the DPC-FFD/-RCSI [1]. The capacity $\mathcal{C}$ of the DPC-FFD in (1) is

$$\mathcal{C} = \max_{P_{U,X|S}} I(Y;U) - I(U;S), \tag{4}$$

while the capacity of the DPC-FFD-RCSI is obtained from (4) by considering the channel output $[Y, A]$.

The expression in (4) contains an auxiliary RV $U$ and entails a maximization over the distribution $P_{U,X|S}$. For this reason it is hard to evaluate in closed form, either analytically or numerically.

Dirty paper channel with receiver side information and phase fading. In [7], we derived the approximate capacity of the DPC-FFD-RCSI for the case in which $P_A$ is a circularly binomial distribution.

Theorem II.3. Capacity of the DPC-FFD-RCSI with circularly binomial fading [7, Th. IV.5].

Consider the DPC-FFD-RCSI

$$Y_i = X_i + e^{j\Theta_i} S_{R,i} + Z_i, \quad i \in [1 \ldots N], \tag{5}$$

where the state $S_{R,i}$ is a Gaussian RV with zero mean and variance $Q$, while the fading is $A = e^{j\Theta}$ for

$$P_\Theta(\theta) = \frac{1}{2}\left(1_{\{\theta = +\Delta\}} + 1_{\{\theta = -\Delta\}}\right), \quad \Delta \in [0, \pi/2]. \tag{6}$$

Then, if $\pi/4 \le \Delta \le \pi/2$, the capacity lies within a constant gap of 3 bits per channel use from the outer bound

$$
R^{\rm OUT} = \begin{cases}
\log(P+1) + 2 & c^2 < 1\\
\frac{1}{2}\log(P+1) + \frac{1}{2}\log\left(1 + (\sqrt{P} + c)^2\right) - \frac{1}{2}\log(2c^2) + 2 & 1 \le c^2 < P+1\\
\frac{1}{2}\log(P+1) + 2 & c^2 \ge P+1
\end{cases}
$$

where $c = \sin(\Delta)\sqrt{Q}$.

Carbon copying onto dirty paper. A model related to the DPC-FFD is the "carbon copying onto dirty paper" channel of [8]: in this channel model there are $M$ possible state sequences $S_j^N$ that can affect the channel output. The transmitter has knowledge of each sequence but does not know which one will appear. Correct decoding must be guaranteed regardless of the state realization, that is, for each of the possible channel outputs

$$Y_j^N = X^N + cS_j^N + Z_j^N, \quad j \in [1 \ldots M], \tag{7}$$

where $S_j^N$ is an i.i.d. Gaussian sequence for each $j \in [1 \ldots M]$. In [8], inner and outer bounds to the capacity are derived, but capacity has yet to be determined.


III. The dirty paper channel with fast fading dirt 


We begin by investigating the capacity of the DPC-FFD in Fig. 1. Since no closed-form expression for the optimization in (4) is available, we derive a novel outer bound that is expressed solely as a function of the channel parameters. This outer bound can be approached, for some models, by a simple achievable strategy in which the transmitter performs Costa pre-coding against the term $c\mu_A S$, i.e. the average realization of the fading times the state.

For the DPC-FFD the term $c\mu_S A$ acts as additional noise, since it is unknown at both the transmitter and the receiver: for this reason, in the following we assume that $\mu_S = 0$.

Theorem III.l. Outer bound and partial approximate capacity for DPC-FFD. 

Consider the DPC-FFD in Fig. 1 and let $h(A) = \frac{1}{2}\log(2\pi e\alpha)$ for some $\alpha \in [0,1]$; then the capacity $\mathcal{C}$ is upper bounded as



( 8 ) 


and the capacity is within a gap of $G$ bits/channel-use from $R^{\rm OUT}$, where



(9) 


Proof: The proof can be found in App. A.


The gap from capacity in Th. III.1 can be easily evaluated for some canonical fading distributions.


Lemma III.2. Gap from capacity for some fading distributions.

• When $A$ is Gaussian distributed with mean $\mu_A$ and unit variance, the capacity is known to within a gap $G_{\mathcal{N}}$

• When $A$ is uniformly distributed over $[\mu_A - \sqrt{3}, \mu_A + \sqrt{3}]$, the capacity can be attained to within a gap $G_{\mathcal{U}}$

• When $A$ is Rayleigh distributed, i.e. $A = \sqrt{U^2 + V^2}$ for independent $U, V \sim \mathcal{N}(0, 2/(4 - \pi))$, capacity can be attained to within a gap $G_R$ defined as

where $\gamma$ is the Euler-Mascheroni constant.

• When $A$ is log-normal distributed, i.e. $A = e^{Z_A}e^{-\mu - \sigma^2/2}(e^{\sigma^2} - 1)^{-1/2}$ for $Z_A \sim \mathcal{N}(\mu, \sigma^2)$, capacity can be attained to within a gap $G_{\log}$ defined as

which is not bounded over all values of $\mu$ and $\sigma^2$.

The result in Th. III.1 is essentially a negative result, since it establishes that, for a number of fading distributions for which $\alpha$ is close to one, the best strategy is to Costa pre-code against the mean value of the fading times the state and treat the term $A_0S_0$ as additional noise. This strategy performs very poorly when compared to full state pre-cancellation and, indeed, for any choice of the power $P$, capacity tends to a small constant as $c^2$ increases.

Note that the gap $G$ in (9) for the log-normal distribution is not bounded: the variance of this distribution grows exponentially with $\sigma^2$ while the entropy grows only logarithmically with $\sigma^2$; therefore $\alpha$ can be made arbitrarily small and $G$ arbitrarily large.
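A minimal sketch of how $\alpha$ can be evaluated for the distributions of Lemma III.2, assuming each law is rescaled to unit variance and using the standard closed-form differential entropies (computed in nats). It prints $\alpha$ together with $\frac{1}{2}\log_2(1/\alpha)$, the term through which $\alpha$ enters the gap computation in App. A; the remaining terms of $G$ in (9) are not reproduced here. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def alpha_from_entropy(h_nats):
    """Solve h(A) = 1/2 log(2*pi*e*alpha) for alpha (h(A) given in nats)."""
    return math.exp(2.0 * h_nats) / (2.0 * math.pi * math.e)

def h_gaussian():
    """Unit-variance Gaussian: h = 1/2 ln(2*pi*e)."""
    return 0.5 * math.log(2.0 * math.pi * math.e)

def h_uniform():
    """Uniform on an interval of width 2*sqrt(3) (unit variance): h = ln(2*sqrt(3))."""
    return math.log(2.0 * math.sqrt(3.0))

def h_rayleigh():
    """Rayleigh with scale^2 = 2/(4 - pi) (unit variance): h = 1 + ln(scale/sqrt(2)) + gamma/2."""
    scale = math.sqrt(2.0 / (4.0 - math.pi))
    return 1.0 + math.log(scale / math.sqrt(2.0)) + EULER_GAMMA / 2.0

def h_lognormal(sigma2):
    """e^Z with Z ~ N(mu, sigma2), rescaled to unit variance; the result does not depend on mu."""
    return (0.5 * math.log(2.0 * math.pi * math.e * sigma2)
            - 0.5 * math.log(math.expm1(sigma2)) - sigma2 / 2.0)

cases = [("Gaussian", h_gaussian()), ("uniform", h_uniform()), ("Rayleigh", h_rayleigh())]
cases += [(f"log-normal, sigma^2={s2}", h_lognormal(s2)) for s2 in (0.1, 1.0, 5.0)]
for name, h in cases:
    alpha = alpha_from_entropy(h)
    print(f"{name:26s} alpha = {alpha:.4f}   0.5*log2(1/alpha) = {0.5 * math.log2(1.0 / alpha):.3f} bits")
```

In this sketch the Gaussian law gives $\alpha = 1$ and the uniform law $\alpha = 6/(\pi e)$, while $\alpha$ for the log-normal law decays quickly as $\sigma^2$ grows, in line with the remark above.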

In fact, we expect the outer bound in (8) to be close to capacity for a larger set of distributions than those for which $\alpha$ is close to one. The difficulty in developing a more general result lies in the lack of a tighter outer bound.

Note also that this result does not hold for discrete fading distributions and thus does not include extensions of the result in Th. II.3 to the case with no RCSI.

IV. Dirty Paper Channel with Fast Fading Dirt and Receiver Side Information

We now turn our attention to the DPC-FFD-RCSI: also for this channel, capacity can be obtained from Th. II.2, but the optimization is extremely hard to express in closed form. This case is significantly harder to study than the case with no receiver fading information because of the distributed way in which the transmitter and the receiver can cooperate in dealing with the term $cAS$. As an illustrative example, consider the DPC-FFD-RCSI with no additive noise, in which the state and the input are restricted to take values $\pm 1$, that is

$$Y = X + AS, \quad X, S \in \{-1, 1\}, \tag{10}$$

while $A$ has any distribution. Given the cardinality of the input, the capacity of this channel is at most 1 bit/channel-use. This rate can be attained by setting $P_X(-1) = P_X(+1) = 1/2$, independent of $S$, and by setting $U = XS$, which is then also independent of $S$, in (4). With this assignment, $U$ can be recovered from the channel output by considering the squared channel output; in fact,

$$(Y^2 \mid A = a) = X^2 + a^2S^2 + 2aXS = 1 + a^2 + 2aU, \tag{11}$$

so that $U = (Y^2 - 1 - A^2)/(2A)$ for any $A \neq 0$, regardless of the distribution of $A$. This simple example shows that the maximization in (4) might yield some unexpected results.
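The example in (10)-(11) can be checked by simulation. The sketch below assumes, for illustration only, a fading magnitude drawn uniformly from $[0.1, 5]$ with a random sign; any law that avoids $A = 0$ would work, since the inversion of (11) does not depend on the distribution of $A$. The helper names are hypothetical.

```python
import random

def recover_u(y, a):
    """Invert (11): y^2 = 1 + a^2 + 2*a*u, valid whenever a != 0."""
    return (y * y - 1.0 - a * a) / (2.0 * a)

def simulate(n=10_000, seed=0):
    """Count how often the receiver fails to recover U = XS from (Y, A)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x = rng.choice((-1, 1))  # uniform input, independent of S
        s = rng.choice((-1, 1))  # binary state known at the transmitter
        a = rng.uniform(0.1, 5.0) * rng.choice((-1, 1))  # illustrative nonzero fading
        y = x + a * s            # noiseless channel (10)
        u = x * s                # auxiliary choice U = XS
        if round(recover_u(y, a)) != u:
            errors += 1
    return errors

print("decoding errors:", simulate())
```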

Given the difficulty of the problem at hand, we are able to make only partial progress in characterizing the capacity of the DPC-FFD-RCSI. In the following we provide two approximate capacity results for two classes of discrete distributions of $A$: (i) the class of discrete distributions in which one of the probability masses is larger than or equal to one half, and (ii) the class of uniform distributions over a discrete set whose points are increasingly spaced apart. Both results are generalizations of our previous result in Th. II.3 and employ a similar inner bound in which the transmitter simply performs Costa pre-coding against one realization of the fading times the state. Our contribution is, therefore, to identify a set of channels in which Costa pre-coding is approximately optimal, although it is clear that this coding strategy is not capacity achieving in general.




Note that, for the DPC-FFD-RCSI, we again consider the case in which $\mu_S$ is equal to zero: since the receiver has knowledge of $A$, it can subtract $c\mu_S A$ from the channel output. We also let $\mu_A = 0$ for simplicity: the general case is considered in the journal version of this work.

Let us first consider the class of distributions in which there exists an outcome $A = a'$ with $P_A(a') > 1/2$: this class of distributions generalizes the distribution considered in our result in Th. II.3. For this fading model the transmitter can Costa pre-code against the realization $ca'S$ and obtain full state cancellation for approximately a fraction $P_A(a')$ of the time. The performance of this strategy can be improved by letting the channel input be composed of two codewords: one treating the state times fading as noise and one that Costa pre-codes against $ca'S$. By optimizing over the power allocated to each codeword, one obtains a larger inner bound.

Theorem IV.1. Approximate capacity for a discrete distribution with a mass larger than one half.

Consider a DPC-FFD-RCSI in Fig. 2 and let $A$ have a discrete distribution $P_A(a)$ with support $\mathcal{A}$, where there exists $A = a'$ such that $P_A(a') > 1/2$. Define moreover

$$P'_A = P_A(a'), \quad \bar{P}'_A = 1 - P_A(a'),$$
$$G = \bar{P}'_A\, \mathbb{E}\left[\log(a - a')^2 \mid a \neq a'\right],$$
log 


G' = P A E 


{a —a') 2 „ , . . , 

—~2-hi ) | a^a 


then the capacity is upper bounded as 


<€ < R OUJ = 


\ log(l +P) + 1 
4 log (l+P) 


Pa < P'a< 


-log (Pc 2 ) + 1 — G/2 


^log(l+P) + |-G/2 


P' A c 2 <P' A {P+ 1) 
P' A c 2 >P A (P+ 1) 


and the capacity lies within $G' - G + 3$ bits per channel use of $R^{\rm OUT}$.


Proof: The proof can be found in App. B.

The result of Th. IV.1 can be evaluated for some discrete fading distributions.


Lemma IV.2. Gap from capacity for some discrete distributions.

• When $A$ is distributed according to a geometric distribution, i.e.

$$P_A(k_a + n\Delta) = (1-p)^n p, \quad n \in \mathbb{N}, \tag{12}$$

for some $p \in [0,1]$ and $\Delta > 0$, with $\Delta^2(1-p) = p^2$ (to obtain unit variance) and $k_a = -\Delta(1-p)/p$ (to obtain zero mean), Th. IV.1 can be applied for $p > 1/2$. For this choice of $p$, $A = k_a$ has probability larger than one half and the best strategy for the transmitter is to Costa pre-code against the sequence $ck_aS$ or otherwise treat the fading times state as noise. The value of the outer bound in Th. IV.1 depends on the value $G$, while the gap from capacity depends on $G'$, which are obtained as

$$G = 2\sum_{n=1}^{\infty}\log(n\Delta)\, p(1-p)^n \ge (1-p)\log\Delta^2$$

G '= log +1) mi -/>)-< ^<i - rt. (U) 

for which

G'-G<(l-p)(k- 2 + logA 2 ). (14) 


The gap between inner and outer bounds goes to infinity as $\Delta$ goes to zero: in this regime the channel reduces to the classic DPC with no fading, for which the bounding techniques in Th. IV.1 are no longer tight. Note that (14) goes to infinity as $k_a$ goes to zero, but this is only a consequence of the bounding in (?).

• Binomial distribution. Consider now the case in which $A$ has a binomial distribution of the form

$$P_A(k_a + n\Delta) = \binom{2N}{n} p^n (1-p)^{2N-n}, \quad n \in [0 \ldots 2N],$$

with $2Np(1-p)\Delta^2 = 1$ to obtain unit variance and $k_a = -2Np\Delta$ to obtain zero mean. By simple enumeration we see that for $N > 1$ no assignment of $p$ gives a probability mass larger than one half. For $N = 1$ there is only one $p$ which makes the theorem applicable: $p = 1/2$, which corresponds to the probability vector $[1/4, 1/2, 1/4]$. This result extends the case in which the probability vector is $[1/2, 1/2]$, which corresponds to the case in Th. II.3.
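For the geometric case, the quantities above can be evaluated numerically. This sketch assumes logarithms in base 2, truncates the support at a finite number of points (a numerical convenience), and uses the unit-variance and zero-mean conditions stated for (12); it checks the moments and compares $G$ with the lower bound $(1-p)\log\Delta^2$. The value $p = 0.7$ is an arbitrary choice satisfying $p > 1/2$, and the function names are hypothetical.

```python
import math

def geometric_fading(p, n_max=2000):
    """Support and masses of (12), truncated at n_max for numerics.

    Delta and k_a follow from Delta^2 (1 - p) = p^2 (unit variance)
    and k_a = -Delta (1 - p) / p (zero mean)."""
    delta = p / math.sqrt(1.0 - p)
    k_a = -delta * (1.0 - p) / p
    points = [k_a + n * delta for n in range(n_max + 1)]
    masses = [p * (1.0 - p) ** n for n in range(n_max + 1)]
    return points, masses, delta

def g_value(points, masses):
    """G = sum over a != a' of P_A(a) log2((a - a')^2), with a' = k_a = points[0]."""
    a_prime = points[0]
    return sum(m * math.log2((a - a_prime) ** 2) for a, m in zip(points[1:], masses[1:]))

p = 0.7  # P_A(k_a) = p > 1/2, so Th. IV.1 applies
points, masses, delta = geometric_fading(p)
mean = sum(m * a for a, m in zip(points, masses))
var = sum(m * a * a for a, m in zip(points, masses)) - mean ** 2
G = g_value(points, masses)
bound = (1.0 - p) * math.log2(delta ** 2)
print(f"mean = {mean:.1e}, var = {var:.6f}, G = {G:.4f}, (1-p) log2(Delta^2) = {bound:.4f}")
```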


Another possible extension of the result in Th. II.3 is the case in which $A$ is uniformly distributed over a set with more than two elements. In the following we show such a generalization: the caveat is that the points in the support of the distribution must be increasingly spaced apart. This result is similar in spirit to our result in [?] for the DPC with slow fading, that is, for the channel in which a fading coefficient is randomly drawn from a set of possible values before transmission and is kept constant throughout the transmission. The intuitive interpretation of this result is as follows: when two fading values are sufficiently spaced apart, the transmitter cannot exploit the correlation between the two channel outputs corresponding to the two fading realizations. For this reason the best choice for the transmitter is to Costa pre-code against one realization of the fading times the state.


Theorem IV.3. Approximate capacity in the “strong fading” regime. 

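Consider the case in which $A$ is uniformly distributed over the set

$$\mathcal{A}(M) = \{a_0, a_1, \ldots, a_{M-1}\}, \quad a_i \in \mathbb{R}, \tag{15}$$

with $\mathrm{Var}(A) = 1$, and let $\Delta_i$ be the distance between two consecutive points in $\mathcal{A}$, that is

$$\Delta_{i+1} = a_{i+1} - a_i, \quad i \in [0 \ldots M-2], \tag{16}$$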







and $\Delta_1 > \alpha$; then, if


A ^ +1 > (ccc 2 — 1) A ? +1 + 2, i > 2 

7=1 


(17) 


for some $\alpha > 0$, then an outer bound to capacity is

2 1 °s( 1 + ?Tt) + : 
23?l°g(l + ^) + 


7?’ 


OUT . 


M— 1 / cl 
M — M 

M — M \ r ^ i 7 


+ W lo £ (c 2 ) + l+loga /2 


^log(l+P) + l+loga /2 £> W>+ 1 ) 
and the capacity lies within a gap of $\max\{\log(\alpha)/2 - G + 3, 1\}$, where

' {a — a') 2 


G=(M- 1)E 


log 


+ 1 1 \a 7 ^ 


(18) 


Proof: The proof is provided in App. C. ■

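As an example of Th. IV.3, consider the case in which $\alpha = c^2/(c^2+1)$: in this case the condition in (17) translates to the set $\mathcal{A}(M)$ defined as

$$\mathcal{A}(M) = \{0, \Delta_1, c\Delta_1, c^2\Delta_1, \ldots, c^{M-2}\Delta_1\} - \Delta_0, \tag{19}$$

where the shift $\Delta_0 = \frac{\Delta_1}{M}\frac{1-c^{M-1}}{1-c}$ sets the mean to zero and $\Delta_1$ is determined so that the variance is equal to one, that is

$$\frac{\Delta_1^2}{M}\,\frac{1-c^{2M-2}}{1-c^2} - \left(\frac{\Delta_1}{M}\,\frac{1-c^{M-1}}{1-c}\right)^2 = 1, \tag{20}$$

which follows from the properties of the geometric series.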
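The set in (19)-(20) can be constructed explicitly. The sketch below solves (20) for $\Delta_1$, shifts the points to zero mean (consistent with $\mu_A = 0$), and checks that the spacings between consecutive points grow by a factor $c$ from the second spacing on. The values $c = 3$ and $M = 5$ are illustrative, the function name is hypothetical, and the construction assumes $c \neq 1$.

```python
import math

def strong_fading_set(c, M):
    """Support of (19): {0, D1, c*D1, ..., c^(M-2)*D1} shifted to zero mean,
    with D1 chosen from the unit-variance condition (20). Requires c != 1."""
    s1 = (1 - c ** (M - 1)) / (1 - c) / M           # mean of the unshifted points / D1
    s2 = (1 - c ** (2 * M - 2)) / (1 - c ** 2) / M  # second moment / D1^2
    d1 = 1.0 / math.sqrt(s2 - s1 ** 2)              # solves (20)
    raw = [0.0] + [d1 * c ** k for k in range(M - 1)]
    shift = d1 * s1                                 # the shift Delta_0
    return [x - shift for x in raw]

pts = strong_fading_set(c=3.0, M=5)
var = sum(x * x for x in pts) / len(pts) - (sum(pts) / len(pts)) ** 2
print([round(x, 4) for x in pts], f"var = {var:.6f}")
```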

Note that Th. IV.3 implies that, when $c^2$ is much larger than $P$, the capacity of the DPC-FFD-RCSI is approximately $1/M$ times the capacity of the channel without state.

We conclude by providing an outer bound for the case of a continuous fading distribution. Unfortunately, this bound is not tight in general: this reflects the fact that the outer bounding techniques employed so far are too crude to address this general case.


Theorem IV.4. Outer Bound for continuous fading distributions. 

Consider the case in which $A$ has a continuous distribution such that there exists an interval $I = [a, b] \subset \mathbb{R}$ with $P_A(I) > 1/2$, and let moreover

$$a' \in [a, b] \ \text{s.t.}\ P_A(a')(b - a) = P_A(I),$$
$$G = \int_{\mathbb{R}\setminus I} \log\left((a - a')^2\right) dP_A, \tag{21}$$












then the capacity is upper bounded as 


< R om = 7 


ilog(l +P) + l 
F f\og{\+P)+ 

^ log (Pc 2 ) + 1 - G/2 
f log(l+P) + l-G/2 


^(7 ) < Pa(/)c 2 
Pa(I)c 2 <P a {1)(P+ 1) 

P A (/)c 2 >P A (7)(P+1) 


It is straightforward to verify that the above bound cannot be attained by simply performing Costa pre-coding against a value $ca'S$ for some $a'$ of choice: in fact, this strategy achieves

$$
\begin{aligned}
R^\star &= \frac{1}{2}\log(1+P) - \frac{1}{2}\mathbb{E}_A\left[\log\left(\frac{Pc^2(a-a')^2}{P + c^2a^2 + 1} + 1\right)\right]\\
&\approx \frac{1}{2}\log(1+P) - \frac{1}{2}\mathbb{E}_A\left[\log\left(\frac{\min\{a^2c^2, P\}\,(a-a')^2}{a^2} + 1\right)\right],
\end{aligned}
$$

which goes to zero as $P$ or $c^2$ grows, unless $A$ is mostly concentrated around $a' \pm 1/c^2$.
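To see the behavior of $R^\star$ as $c^2$ grows, the exact (first) expression above can be evaluated numerically. This sketch assumes, for illustration only, that $A$ is uniform on $[-\sqrt{3}, \sqrt{3}]$ (zero mean, unit variance), that $a' = 0$, and that logarithms are in base 2; it approximates $\mathbb{E}_A$ with a midpoint rule. The function name is hypothetical.

```python
import math

def costa_single_point_rate(P, c2, a_prime=0.0, n_grid=20_000):
    """First expression for R* above, with A uniform on [-sqrt(3), sqrt(3)]
    (an illustrative law) and E_A computed by a midpoint rule."""
    lo, hi = -math.sqrt(3.0), math.sqrt(3.0)
    step = (hi - lo) / n_grid
    penalty = 0.0
    for i in range(n_grid):
        a = lo + (i + 0.5) * step
        ratio = P * c2 * (a - a_prime) ** 2 / (P + c2 * a * a + 1.0)
        penalty += math.log2(ratio + 1.0) * step / (hi - lo)  # uniform density 1/(hi - lo)
    return 0.5 * math.log2(1.0 + P) - 0.5 * penalty

for c2 in (0.0, 1.0, 10.0, 100.0, 1e4):
    print(f"c^2 = {c2:>8}: R* = {costa_single_point_rate(P=10.0, c2=c2):.4f} bits")
```

Each term inside the expectation is increasing in $c^2$, so the computed rate decreases monotonically in $c^2$ and, for $a' = 0$, approaches zero for large $c^2$.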


V. Conclusion 

In this paper we studied a variation of the classic dirty paper channel in which the channel state is multiplied by a fast fading process which is unknown at the transmitter. We considered both the case in which the decoder has knowledge of the fading and the case in which it does not. For this model we derived inner and outer bounds to capacity and bounded the difference between the two when possible. When fading knowledge is not available at the receiver, the gap between inner and outer bounds is small for a number of classic fading distributions but it is not bounded for others. When fading knowledge is available at the receiver, we can approximately characterize capacity for some specific discrete distributions of the fading.
Appendix

A. Proof of Th. III.1

• Capacity outer bound

Consider the following series of inequalities developed from Fano's inequality:


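$$
\begin{aligned}
& N(R-\epsilon_N) && \text{(22a)}\\
& \le I(Y^N;W) && \text{(22b)}\\
& \le I(Y^N;W \mid S^N) && \text{(22c)}\\
& = h(Y^N \mid S^N) - h(Y^N \mid W,S^N,X^N) && \text{(22d)}\\
& \le N\max_j h(Y_j \mid S_j) - h(Y^N \mid W,S^N,X^N) && \text{(22e)}\\
& = N\max_j \mathbb{E}_{S_j}\left[h(X_j + csA_j + Z_j \mid S_j = s)\right] - h(Y^N \mid W,S^N,X^N) && \text{(22f)}\\
& \le \frac{N}{2}\mathbb{E}_S\left[\log 2\pi e(P + c^2S^2 + 1)\right] - h(Y^N \mid W,S^N,X^N) && \text{(22g)}\\
& \le \frac{N}{2}\log 2\pi e(P + c^2 + 1) - h(Y^N \mid W,S^N,X^N), && \text{(22h)}
\end{aligned}
$$

where (22g) follows from the GME property, given that $A \perp X$ and $\mathrm{Var}[A] = 1$ by definition, while (22h) follows from Jensen's inequality and from the fact that $\mathbb{E}[S^2] = 1$ (recall that $\mu_S = 0$). Note that the mean of $A$ does not influence this bound.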

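For the term $-h(Y^N \mid W,S^N,X^N)$ we have:

$$
\begin{aligned}
& -h(Y^N \mid W,S^N,X^N) && \text{(23a)}\\
& = -h(cS^NA^N + Z^N \mid W,S^N,X^N) && \text{(23b)}\\
& = -h(cS^NA^N + Z^N \mid S^N) && \text{(23c)}\\
& = -Nh(cS_jA_j + Z_j \mid S_j) && \text{(23d)}\\
& \le -Nh(S_jA_j \mid S_j) - N\log|c| && \text{(23e)}
\end{aligned}
$$

where (23c) follows from the Markov chain $cS^NA^N + Z^N - S^N - (W, X^N)$, (23d) from the fact that $A_j$, $S_j$ and $Z_j$ are i.i.d., and (23e) from the fact that adding the independent noise $Z_j$ cannot decrease the differential entropy.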
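The term $-h(S_jA_j \mid S_j)$ can be rewritten as

$$-h(S_jA_j \mid S_j) = -h(A_j) - \mathbb{E}\left[\log|S_j|\right] = \frac{1}{2}\log 2 - h(A_j) + \frac{\gamma}{2}\log e,$$

where $\gamma$ is the Euler-Mascheroni constant, $\gamma \approx 0.577$, and where we used $\mathbb{E}[\ln|S_j|] = -(\gamma + \ln 2)/2$ for $S_j \sim \mathcal{N}(0,1)$. Note that the derivation holds for $A$ both continuous or discrete.
Combining the bounds in (22h) and (23e), we obtain the expression in (8).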
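The constant in the last step relies on $\mathbb{E}[\ln|S_j|] = -(\gamma + \ln 2)/2$ for a standard Gaussian $S_j$. A quick Monte Carlo check of this value, as a sketch (the sample size and seed are arbitrary choices):

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def mean_log_abs_gaussian(n=400_000, seed=7):
    """Monte Carlo estimate of E[ln|S|] for S ~ N(0, 1)."""
    rng = random.Random(seed)
    return sum(math.log(abs(rng.gauss(0.0, 1.0))) for _ in range(n)) / n

estimate = mean_log_abs_gaussian()
closed_form = -(EULER_GAMMA + math.log(2.0)) / 2.0
print(f"Monte Carlo: {estimate:.4f}, closed form: {closed_form:.4f}")
```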


• Capacity inner bound 


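For the inner bound, we consider Costa's dirty paper coding strategy to pre-cancel $c\mu_A S$ while disregarding the remaining randomness in the fading. This strategy attains

$$R^{\rm IN} = I(Y;U) - I(U;S) = h(U \mid S) - h(U \mid Y). \tag{24}$$

Consider now the assignment

$$X \sim \mathcal{N}(0, P), \quad U = X + \kappa S,$$

with $X$ independent of $S$, which attains

$$R^{\rm IN} \ge \frac{1}{2}\log\frac{P\left(P + c^2(1+\mu_A^2) + 1\right)}{(P + \kappa^2)\left(P + c^2(1+\mu_A^2) + 1\right) - (P + \kappa c\mu_A)^2}, \tag{25}$$

by upper bounding $h(U \mid Y)$ using the GME property. The optimal choice of $\kappa$ is

$$\kappa^* = \frac{P}{P + 1 + c^2}\, c\mu_A,$$

which achieves

$$R^{\rm IN} = \frac{1}{2}\log\left(1 + \frac{P}{c^2 + 1}\right), \tag{26}$$

as expected.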
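The choice of $\kappa^*$ and the resulting rate (26) can be checked numerically. The sketch below evaluates the bound (25) on a grid of $\kappa$ values and compares its maximum with (26); it assumes $c > 0$ (so that $c = \sqrt{c^2}$), logarithms in base 2, and arbitrary values of $P$, $c^2$ and $\mu_A$. The function names are hypothetical.

```python
import math

def costa_rate(P, c2, mu_a, kappa):
    """Bound (25): X ~ N(0, P), U = X + kappa*S, S ~ N(0, 1), with h(U|Y)
    bounded by the Gaussian (linear MMSE) error variance of U given Y."""
    d = P + c2 * (1.0 + mu_a ** 2) + 1.0       # E[Y^2]
    cross = P + kappa * math.sqrt(c2) * mu_a   # E[UY], taking c = sqrt(c2) > 0
    mmse = (P + kappa ** 2) - cross ** 2 / d   # error variance of U given Y
    return 0.5 * math.log2(P / mmse)

def kappa_star(P, c2, mu_a):
    return math.sqrt(c2) * mu_a * P / (P + c2 + 1.0)

P, c2, mu_a = 10.0, 4.0, 1.5
k_opt = kappa_star(P, c2, mu_a)
grid_best = max(costa_rate(P, c2, mu_a, k / 1000.0) for k in range(-5000, 5001))
closed = 0.5 * math.log2(1.0 + P / (c2 + 1.0))
print(f"kappa* = {k_opt:.4f}, rate at kappa* = {costa_rate(P, c2, mu_a, k_opt):.6f}, "
      f"grid max = {grid_best:.6f}, (26) = {closed:.6f}")
```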


• Gap between inner and outer bound

By comparing the outer bound expression in (8) and the inner bound expression in (26), we have that the difference between the two expressions is


G = R om -R 1N 


1 y 

= - log 27te(P + 1 + c 2 ) — - log(27rec 2 a) + - 

— (^log2ne(P + 1 +c 2 ) — ^log27te(c 2 -\- 1) 


= b og 
108 

1 


c 2 +1 
ac 2 


1 

2 

4 \+\ 


3 ac 2 J 


< o l0 § - + o> 


a 


1 


where (27d) follows from the fact that capacity is known to within 1 bit for $c^2 < 3$.
Equation (27e) concludes the proof.


(27a) 

(27b) 

(27c) 

(27d) 

(27e) 








B. Proof of Th. IV.1


• Capacity outer bound 

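Using Fano's inequality we write

$$
\begin{aligned}
N(R-\epsilon) &\le I(Y^N; W \mid A^N) && \text{(28a)}\\
&\le \frac{N}{2}\mathbb{E}_A\left[\log 2\pi e\left(P + a^2c^2 + 2|c||a|\sqrt{P} + 1\right)\right] - H(Y^N \mid W, A^N) && \text{(28b)}\\
&\le \frac{N}{2}\mathbb{E}_A\left[\log 2\pi e\left(P + a^2c^2 + 1\right)\right] - H(Y^N \mid W, A^N) + \frac{N}{2} && \text{(28c)}\\
&\le \frac{N}{2}\log 2\pi e\left(P + c^2 + 1\right) - \sum_{a^N} P(a^N)\, H(Y^N \mid W, A^N = a^N) + \frac{N}{2} && \text{(28d)}
\end{aligned}
$$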


where (28d) follows from Jensen's inequality. Next we derive a bound on $H(Y^N \mid W, A^N)$ based on the letter-typicality of the sequence $a^N$, defined as

$$\left|\frac{1}{N}N(k \mid a^N) - P_A(k)\right| \le \epsilon P_A(k), \quad \forall k \in \mathcal{A}, \tag{29}$$

where $N(k \mid a^N)$ is the number of symbols in the sequence $a^N$ which are equal to $k$, i.e.

$$N(k \mid a^N) = \sum_{j=1}^{N} 1_{\{k = a_j\}}. \tag{30}$$

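Accordingly, the $\epsilon$-typical set $\mathcal{T}_\epsilon^N(P_A)$ is defined as the set of $a^N$ which satisfy (29), that is

$$\mathcal{T}_\epsilon^N(P_A) = \left\{a^N : \left|\frac{1}{N}N(k \mid a^N) - P_A(k)\right| \le \epsilon P_A(k), \ \forall k \in \mathcal{A}\right\}. \tag{31}$$

Using the letter-typicality in (29) we write:

$$
\begin{aligned}
& -\sum_{a^N} P(a^N)\, H(Y^N \mid W, A^N = a^N) && \text{(32a)}\\
& \le -\sum_{a^N \in \mathcal{T}_\epsilon^N(P_A)} P(a^N)\, H(Y^N \mid W, A^N = a^N). && \text{(32b)}
\end{aligned}
$$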
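The letter-typicality test (29)-(31) is straightforward to implement. In this sketch the three-point fading law and the sequence are hypothetical examples, and symbols outside the support are treated as atypical, since they have zero probability under $P_A$.

```python
from collections import Counter

def letter_counts(seq):
    """N(k|a^N) of (30): number of occurrences of each symbol k in the sequence."""
    return Counter(seq)

def is_letter_typical(seq, pmf, eps):
    """Membership in the typical set (31): |N(k|a^N)/N - P_A(k)| <= eps * P_A(k) for all k."""
    n = len(seq)
    counts = letter_counts(seq)
    if any(k not in pmf for k in counts):  # zero-probability symbols: atypical
        return False
    return all(abs(counts[k] / n - p) <= eps * p for k, p in pmf.items())

pmf = {0.0: 0.6, 1.0: 0.3, -2.0: 0.1}  # hypothetical three-point fading law
seq = [0.0] * 58 + [1.0] * 32 + [-2.0] * 10
print(is_letter_typical(seq, pmf, eps=0.1), is_letter_typical(seq, pmf, eps=0.01))
```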


Let now $\epsilon < 1 - 1/(2P_A(a'))$, so that $N(a' \mid a^N) > N/2$. With this provision, we can define the sequence $a'^N$ as a permutation of the sequence $a^N$ where

• if $a_i \neq a'$, then $a'_i = a'$,

• if $a'_i \neq a'$, then $a_i = a'$.

This permutation is also depicted in Fig. 3: the sequence $a'^N$ is obtained by permuting the positions $i$ for which $a_i \neq a'$ with some of the positions $j$ for which $a_j = a'$; since $N(a' \mid a^N) > N/2$, this can always be done. Note that $N - 2(N - N(a' \mid a^N)) = 2N(a' \mid a^N) - N$ positions are such that $a_i = a'_i = a'$.
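The permutation of Fig. 3 can be sketched as follows. The function name and the example sequence (with $a' = 0$) are hypothetical. Besides checking that $2N(a' \mid a^N) - N$ positions keep $a_i = a'_i = a'$, the example also shows that each nonzero difference $|a_i - a'_i| = |k - a'|$ occurs $2N(k \mid a^N)$ times, which is the counting used later in this proof.

```python
def permute_towards(a, a_prime):
    """Build a'^N from a^N as in Fig. 3: each position i with a_i != a_prime is
    swapped with a distinct position j with a_j == a_prime.
    Requires N(a_prime | a^N) > N/2 so that enough partners exist."""
    off = [i for i, x in enumerate(a) if x != a_prime]
    on = [i for i, x in enumerate(a) if x == a_prime]
    if len(on) <= len(a) / 2:
        raise ValueError("a_prime must occupy more than half of the positions")
    b = list(a)
    for i, j in zip(off, on):    # pair each off-position with an on-position
        b[i], b[j] = a[j], a[i]  # swap: b[i] = a_prime, b[j] = a_i
    return b

a = [0, 0, 3, 0, -1, 0, 0, 3, 0, 0]  # hypothetical sequence with a' = 0
b = permute_towards(a, 0)
same = sum(1 for x, y in zip(a, b) if x == y == 0)
diffs = [abs(x - y) for x, y in zip(a, b)]
print(b, same, 2 * a.count(0) - len(a), diffs)
```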

With this definition of $a'^N$ we next define the equivalent channel output

$$\tilde{Y}^N = X^N + c\,a'^N S^N + \tilde{Z}^N, \tag{33}$$









r 


Portion of Cl 

= a 


N 


< 


> 


Portion of Ct 
7 ^ a 


N 


< 


sequence 

a N 


sequence 

a N 


Permutation 


>- 


-N 

Portion of Cl 


7 ^ a 


< 


> 


Portion of Cl 
/ 

= a 


:N 




Fig. 3. The permutation that generates a' N from a N in the proof of Th. I IV. 1 1 in Ann. IbI 

where $\tilde{Z}^N$ has the same marginal distribution as $Z^N$ and any chosen joint distribution with it.
With these definitions in place, we write:


< - 


£ P(a N )H(Y N \W,A N = a N ) 

eSr e N (p A ) 

(34a) 

2 £ P(a N ) (h(Y n \W,A n = a N )+H(Y N \W,A N = a ,N )\ 

2 a"e^(P A ) V 

(34b) 

- £ P{a N )(H{X N + ca N S N + Z N ,X N + ca’ N S N + Z N \W)) 

2 a N e? e N (P A ) V 7 

(34c) 

2 £ P(a N )H(c{a N -a ,N )S N + Z N -Z N ,X N +ca fN S N + Z N \w) 

2 a»e^ N (P A ) 

(34d) 

■i £ / , (a ;v ) + —+ #(F y — F,W',S Ar ,.X' Af )') 

2 a»e? E N (P A ) 

(34e) 

2 a»e? e N (P A ) 

(34f) 

\ £ P(a N ) (h (c(a N — a ,N )S N + Z n — Z N ^\ + ylog(2jre)j 

Z a"e^(P A ) V 7 

(34g) 


where (34e) follows from the fact that $S^N$ and the additive noises are independent of $W$.

Let us now focus solely on the term $\frac{1}{2}\sum_{a^N \in \mathcal{T}_\epsilon^N(P_A)} P(a^N)\, H\left(c(a^N - a'^N)S^N + Z^N - \tilde{Z}^N\right)$: we can make use of the following properties of the typical sets:

1 


- a N e sr N 

, at 


P(a N ) < n 
v ’ — 2 «( i +s)h(a) J 

\^{Pa)\ <(l-5 e )2” (1 “ e)H(A) 
N{k\a N ) < NP A (k)(a)(\ — e), 


(35a) 

(35b) 

(35c) 















for 


8 e = 2\j^\ e - n2min k p ^ k \ 


(36) 


Using the properties in (35), we now write:


\ £ P{a N )H(c{a N -a ,N )S N + Z N -Z N ) 


'a N ES e N (P A ) 

1 1 

< — 


2 2-»(l +£)H(A) 

1 1 

- 2 2-n(l+e)ff(A) 


£ H (c(a N — a N )S N + Z N — Z^) 


i N eSr e N (P A 


N 


< - 


£ £(//(c(a i -a,)5 ! +Z i -Z i )). 

a N E^ e N (P A ) i=l 


(37a) 

(37b) 

(37c) 


We would now like to change the summation on the right-hand side of (37c) from $i \in [1 \ldots N]$ to $k \in \mathcal{A}$. To do so we need to recall how $a'^N$ was defined: $a_i - a'_i$ can take the values $a' - a$, $a - a'$ and $0$. Since the entropy term $H(c(a_i - a'_i)S_i + Z_i - \tilde{Z}_i)$ is not affected by the sign of $a_i - a'_i$, we conclude that there are $2(N - N(a' \mid a^N))$ terms of the form $H(c(a' - k)S_i + Z_i - \tilde{Z}_i)$ for some $k \neq a'$, and $2N(a' \mid a^N) - N$ terms with value $H(Z_i - \tilde{Z}_i)$. Additionally, for a given $k$, $H(c(a' - k)S_i + Z_i - \tilde{Z}_i)$ appears $2N(k \mid a^N)$ times.

With these observations we now write 


1 

2 


1 


2 ~n(l+e)H(A) 


1 


1 


2 2 -/i(t +e)H{A) 

1 


£ £ (H (c(ai - at)Si + Zi - 

a N E^ e N (P A )‘= 1 

£ £ 2N(k\a N )H (c(c 


a N E^ e N {P A )kEi/\a' 


--( 2N{a'\a N )-N)H(Zi-Zi ) 


-Zi)) 

'-k)Si+Zi-Zi) 


(38a) 


(38b) 


We can now choose the joint distribution between $Z_i$ and $\tilde{Z}_i$ to simplify the bound above: for simplicity we choose $\tilde{Z}^N = Z^N$.
With this choice, we can write


~2 2-"(i T E 2^|^)//(c(a'-^ + Z i -Z,) 

zz a NE2r e N (P A )kE^\a' 

- l -{2N(a'\a N )-N)H(Z i -Z i ) 

= - L-„(i+ e)// ( A ) E E 2yV(^)//(c(a'-A:)5 ; )-^log(4^) 

ZZ a»E3r e N (P A )kerf\a' * 

= +e)g(A) E E 2A^(A:|a iV )llog(27rec 2 (a / -A:) 2 )-^log(4^e) 

ZZ a Ne^ e N (P A )ke^\a l z * 

= - 9 _„ (lH I £WA) (l-^)2 n(1 ~ eWA) E ^l^)bog(2^c 2 (a'-A:) 2 ) 

Z k<Esrf\a! 

N 

~ j log(4^e) 

= ~ 2 - n( i| eWA) (1 ~ 5 e )2 n ( 1 - £ ) H W(l - e)N £ P A (*)( fl )±log(2>recV-*) 2 ) 

k<Esrf\a! 

N 

- -log(47te). 


(39a) 

(39b) 

(39c) 

(39d) 

(39e) 











When $N$ is sufficiently large and $\epsilon$ sufficiently small, we then have that


-H(Y n \W,A n ) (40a) 

<- £ Pa (k) (a) \og{2Kec 2 {a - k) 2 ) - j log(47Te) - £ a ii (40b) 

k^sz/\a' 

< - log C 2 - y - ^ log(47Te) - Call , (40c) 


for some $\epsilon_{\rm all}$ that goes to zero as $N \to \infty$.

Using the bound in (40) in (28d), and for $\epsilon_{\rm all}$ sufficiently small, we obtain

R° m = 1. log ( 2 Ke (p + c 2 + 1)) - y logc 2 - ^ i log(2^e) +1 (41a) 

< flog(P + c 2 +l)-^log(c 2 )-^ + l, (41b) 

We next optimize the above expression over a parameter $\tilde{c}^2$ in the set $[0, c^2]$, since capacity must be decreasing in $c$. The optimal value of $\tilde{c}^2$ in (41) is

$$\tilde{c}^2 = \frac{\bar{P}'_A}{P'_A}(1 + P). \tag{42}$$

When $P'_A c^2 > \bar{P}'_A(1 + P)$, this optimization yields a tighter outer bound than the original outer bound in (41)



R om 


\P' A <?>P , A (l+P) 


= ^l0g(l+P) + ^2(Pl)-|+l 


where /^(x) indicates the binary entropy, so that the overall outer bound can be further simplified as 


R om = 


\ log (P + c 2 + 1) 

-^log(c 2 )-f + 1 P' A c 2 <P' A {P + 1) 


1A 


log(l+P)-| + 2 P'c 2 >^(^+1)- 


• Capacity inner bound 


(43) 

(44) 

(45) 


(46) 


For the inner bound, consider the simple scenario in which the transmitter Costa pre-codes against the realization $ca'S$, which occurs more than half of the time. That is, consider the assignment

\begin{align*}
U = X + \frac{P}{P+1}\,a'cS, \qquad X\sim\mathcal{N}(0,P), \quad X\perp S.
\end{align*}
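As a sanity check of the per-realization term that appears in (47b), the sketch below computes $I(Y;U|A=a)-I(U;S)$ directly from Gaussian variances for the assignment above and compares it with the closed-form log-ratio. It assumes $S,Z\sim\mathcal{N}(0,1)$, base-2 logarithms, and hypothetical function names; it relies on the fact that, for jointly Gaussian variables, $I(U;Y)-I(U;S)=h(U|S)-h(U|Y)$. It is a numerical illustration, not part of the proof.

```python
import math

def dpc_rate_from_covariances(P, c, a, a_prime):
    """I(U;Y|A=a) - I(U;S) in bits for U = X + beta*a'*c*S, beta = P/(P+1).

    X ~ N(0,P), S ~ N(0,1), Z ~ N(0,1) independent, Y = X + c*a*S + Z.
    """
    beta = P / (P + 1)                        # Costa's scaling for A = a'
    var_u = P + (beta * a_prime * c) ** 2
    var_y = P + (c * a) ** 2 + 1
    cov_uy = P + beta * a_prime * a * c ** 2
    var_u_given_y = var_u - cov_uy ** 2 / var_y   # MMSE of U from Y
    var_u_given_s = P                              # only X remains once S is known
    return 0.5 * math.log2(var_u_given_s / var_u_given_y)

def dpc_rate_closed_form(P, c, a, a_prime):
    """Log-ratio used for each a != a' in (47b)."""
    num = (1 + (c * a) ** 2 + P) * (1 + P)
    den = P * c ** 2 * (a - a_prime) ** 2 + P + (c * a) ** 2 + 1
    return 0.5 * math.log2(num / den)

# Illustrative values: the two computations agree for every a,
# and at a = a' the rate reduces to 0.5*log2(1 + P).
P, c, a_prime = 5.0, 2.0, 1.0
for a in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(a, dpc_rate_from_covariances(P, c, a, a_prime), dpc_rate_closed_form(P, c, a, a_prime))
```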





The attainable rate of this scheme is

\begin{align}
R^{\rm IN} &= \mathbb{E}_A\Big[\big[I(Y;U|A)-I(U;S)\big]^+\Big] \tag{47a}\\
&\ge \frac{P_A'}{2}\log(1+P)+\sum_{a\in\mathcal{A},\,a\neq a'}\frac{P_A(a)}{2}\log\frac{(1+c^2a^2+P)(1+P)}{Pc^2(a-a')^2+P+c^2a^2+1}, \tag{47b}
\end{align}

where the latter term is bounded as

\begin{align}
&\sum_{a\neq a'}\frac{P_A(a)}{2}\log\frac{(1+c^2a^2+P)(1+P)}{Pc^2(a-a')^2+P+c^2a^2+1} \tag{48a}\\
&= \sum_{a\neq a'}\frac{P_A(a)}{2}\left[\log(1+P)-\log\left(1+\frac{Pa^2c^2}{P+c^2a^2+1}\,\frac{(a-a')^2}{a^2}\right)\right] \tag{48b}\\
&\ge \sum_{a\neq a'}\frac{P_A(a)}{2}\left[\log(1+P)-\log\left(1+\min\{P,a^2c^2\}\,\frac{(a-a')^2}{a^2}\right)\right] \tag{48c}\\
&\ge \sum_{a\neq a'}\frac{P_A(a)}{2}\left[\log(1+P)-\log\left(1+P\,\frac{(a-a')^2}{a^2}\right)\right] \tag{48d}\\
&\ge \sum_{a\neq a'}\frac{P_A(a)}{2}\left[\log(1+P)-\log\left((1+P)\left(1+\frac{(a-a')^2}{a^2}\right)\right)\right] \tag{48e}\\
&= -\sum_{a\neq a'}\frac{P_A(a)}{2}\log\left(1+\frac{(a-a')^2}{a^2}\right)=-G'. \tag{48f}
\end{align}
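The chain (48a)-(48f) can be spot-checked term by term for random parameters. In the sketch below (base-2 logarithms, hypothetical helper names), each term is computed without its weight $P_A(a)$; (48e) coincides numerically with (48f), so only the latter is computed. The function `g_prime` evaluates the sum that appears as $G'$ in (48f) for a small, made-up probability mass function.

```python
import math
import random

def chain_48_terms(P, c, a, a_prime):
    """Terms of (48a), (48b), (48c), (48d), (48f) for one a != a' (weight P_A(a) omitted)."""
    x = (a - a_prime) ** 2 / a ** 2
    q = P + (c * a) ** 2 + 1
    t_a = 0.5 * math.log2((1 + (c * a) ** 2 + P) * (1 + P) / (P * c ** 2 * (a - a_prime) ** 2 + q))
    t_b = 0.5 * (math.log2(1 + P) - math.log2(1 + P * (c * a) ** 2 / q * x))
    t_c = 0.5 * (math.log2(1 + P) - math.log2(1 + min(P, (c * a) ** 2) * x))
    t_d = 0.5 * (math.log2(1 + P) - math.log2(1 + P * x))
    t_f = -0.5 * math.log2(1 + x)
    return t_a, t_b, t_c, t_d, t_f

def g_prime(pmf, a_prime):
    """The sum that appears as G' in (48f), for a pmf given as {a: P_A(a)}."""
    return sum(p / 2 * math.log2(1 + (a - a_prime) ** 2 / a ** 2)
               for a, p in pmf.items() if a != a_prime)

rng = random.Random(1)
violations = 0
for _ in range(1000):
    P, c = rng.uniform(0.01, 100.0), rng.uniform(0.01, 10.0)
    a, a_prime = rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)
    t_a, t_b, t_c, t_d, t_f = chain_48_terms(P, c, a, a_prime)
    chain_holds = (abs(t_a - t_b) < 1e-9 and t_b >= t_c - 1e-12
                   and t_c >= t_d - 1e-12 and t_d >= t_f - 1e-12)
    if not chain_holds:
        violations += 1
print("violations:", violations)
```

Note that the final term does not depend on $P$ or $c$, which is what allows the bound to be reused at a reduced power below.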


This attainable rate can be improved upon by using two codewords: one that treats the interference as noise and one that is Costa pre-coded against $a'cS$ as above. We can assign power $\bar{\alpha}P$ to the first codeword and power $\alpha P$, with $\bar{\alpha}=1-\alpha$, to the second, and then optimize over the power split between the two codewords. This yields the achievable rate


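\begin{align}
R^{\rm IN} &= \max_{\alpha\in[0,1]}\Bigg\{\mathbb{E}_A\left[\frac{1}{2}\log\left(1+\frac{\bar{\alpha}P}{1+c^2A^2+\alpha P}\right)\right]+\frac{P_A'}{2}\log(1+\alpha P) \nonumber\\
&\qquad\qquad+\sum_{a\neq a'}\frac{P_A(a)}{2}\log\frac{(1+c^2a^2+\alpha P)(1+\alpha P)}{\alpha Pc^2(a-a')^2+\alpha P+c^2a^2+1}\Bigg\} \tag{49a}\\
&\ge \max_{\alpha\in[0,1]}\left\{\mathbb{E}_A\left[\frac{1}{2}\log\left(1+\frac{\bar{\alpha}P}{1+c^2A^2+\alpha P}\right)\right]+\frac{P_A'}{2}\log(1+\alpha P)-G'\right\} \tag{49b}\\
&\ge \max_{\alpha\in[0,1]}\left\{\frac{1}{2}\log\left(1+\frac{\bar{\alpha}P}{1+c^2+\alpha P}\right)+\frac{P_A'}{2}\log(1+\alpha P)-G'\right\}, \tag{49c}
\end{align}

where (49b) follows from the fact that the bound in (48) holds for any $P$ and that $G'$ does not depend on $P$. The optimal value of $\alpha P$ in (49c) is then

\begin{align}
\alpha^*P=\max\left\{\min\left\{\frac{P_A'c^2-\bar{P}_A'}{\bar{P}_A'},\,P\right\},\,0\right\}, \tag{50}
\end{align}

so that, when $\bar{P}_A'<P_A'c^2<\bar{P}_A'(P+1)$, we have

\begin{align}
R^{\rm IN} &\ge \frac{1}{2}\log(P+c^2+1)-\frac{\bar{P}_A'}{2}\log c^2-\frac{1}{2}h_2(P_A')-G' \tag{51a}\\
&\ge \frac{1}{2}\log(P+c^2+1)-\frac{\bar{P}_A'}{2}\log c^2-1-G'. \tag{51b}
\end{align}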
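The optimization in (49c)-(51a) can be reproduced with a brute-force search over $x=\alpha P$. The sketch below assumes base-2 logarithms, treats $G'$ as a given constant (called `G1`), picks illustrative parameters in the regime $\bar{P}_A'<P_A'c^2<\bar{P}_A'(P+1)$, and uses hypothetical helper names; it checks that the maximizer in (50) attains the closed form in (51a).

```python
import math

def objective_49c(x, P, c2, p_max, G1):
    """Objective of (49c) in bits, as a function of x = alpha*P; G1 plays the role of G'."""
    return (0.5 * math.log2(1 + (P - x) / (1 + c2 + x))
            + 0.5 * p_max * math.log2(1 + x) - G1)

def alpha_star_P(P, c2, p_max):
    """Maximizer (50), clipped to [0, P]."""
    p_bar = 1.0 - p_max
    return max(min((p_max * c2 - p_bar) / p_bar, P), 0.0)

def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def inner_51a(P, c2, p_max, G1):
    """Closed form (51a), valid for p_bar < p_max*c2 < p_bar*(P+1)."""
    p_bar = 1.0 - p_max
    return (0.5 * math.log2(P + c2 + 1) - 0.5 * p_bar * math.log2(c2)
            - 0.5 * binary_entropy(p_max) - G1)

P, c2, p_max, G1 = 10.0, 2.0, 0.7, 0.2   # illustrative, middle regime
x_star = alpha_star_P(P, c2, p_max)
grid_best = max(objective_49c(P * k / 10000, P, c2, p_max, G1) for k in range(10001))
print(x_star, objective_49c(x_star, P, c2, p_max, G1), inner_51a(P, c2, p_max, G1), grid_best)
```

The derivative of the objective changes sign only once on $[0,P]$, so clipping the stationary point to $[0,P]$ gives the three regimes used below.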





























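Finally, we have shown the achievability of the inner bound

\begin{align}
R^{\rm IN}=\begin{cases}
\frac{1}{2}\log\left(1+\frac{P}{1+c^2}\right), & P_A'c^2\le\bar{P}_A',\\[4pt]
\frac{1}{2}\log(P+c^2+1)-\frac{\bar{P}_A'}{2}\log c^2-1-G', & \bar{P}_A'<P_A'c^2<\bar{P}_A'(P+1),\\[4pt]
\frac{P_A'}{2}\log(1+P)-1-G', & P_A'c^2\ge\bar{P}_A'(P+1).
\end{cases} \tag{52}
\end{align}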


• Gap between inner and outer bound 

A gap between inner and outer bound of 3 bits in the interval $P_A'c^2>\bar{P}_A'$ can be obtained by comparing the expressions in (45)-(46) and (52) in the cases i) $P_A'c^2\le\bar{P}_A'$, ii) $\bar{P}_A'<P_A'c^2<\bar{P}_A'(P+1)$ and iii) $P_A'c^2\ge\bar{P}_A'(P+1)$.

For the case in which $P_A'c^2\le\bar{P}_A'$, we have that $c^2<1$, since $P_A'>1/2$, so that the capacity can be approached to within 1 bit by treating the interference as noise with a variance partially known at the receiver.

In the other two cases, the gap is at most $G'-G/2+3$ bits.
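The case analysis above can be made concrete with a small Python sketch that evaluates the outer bound (45)-(46) and the inner bound (52) in bits. It treats $G$ and $G'$ as given constants (the sketch's `G1` stands for $G'$), uses hypothetical function names and values, and, in case i), compares the inner bound with $\frac{1}{2}\log(1+P)$, which upper-bounds capacity since providing the state to the receiver cannot reduce capacity.

```python
import math

def outer_bound(P, c2, p_max, G):
    """Outer bound (45)-(46), in bits."""
    p_bar = 1.0 - p_max
    if p_max * c2 <= p_bar * (P + 1):
        return 0.5 * math.log2(P + c2 + 1) - 0.5 * p_bar * math.log2(c2) - G / 2 + 1
    return 0.5 * p_max * math.log2(1 + P) - G / 2 + 2

def inner_bound(P, c2, p_max, G1):
    """Inner bound (52), in bits; G1 plays the role of G'."""
    p_bar = 1.0 - p_max
    if p_max * c2 <= p_bar:
        return 0.5 * math.log2(1 + P / (1 + c2))
    if p_max * c2 < p_bar * (P + 1):
        return 0.5 * math.log2(P + c2 + 1) - 0.5 * p_bar * math.log2(c2) - 1 - G1
    return 0.5 * p_max * math.log2(1 + P) - 1 - G1

def gap(P, c2, p_max, G, G1):
    """Case i) is compared with 0.5*log2(1+P); cases ii) and iii) with (45)-(46)."""
    if p_max * c2 <= 1 - p_max:
        return 0.5 * math.log2(1 + P) - inner_bound(P, c2, p_max, G1)
    return outer_bound(P, c2, p_max, G) - inner_bound(P, c2, p_max, G1)

print(gap(5.0, 0.3, 0.7, 0.4, 0.1), gap(5.0, 2.0, 0.7, 0.4, 0.1), gap(5.0, 100.0, 0.7, 0.4, 0.1))
```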


C. Proof of Th. IV.3


• Capacity outer bound 

We proceed in the bounding from Fano's inequality up to (28d) in App. B.

We next wish to construct a sequence $\tilde{a}^N$ from $a^N$ as done in the proof of Th. IV.1: for this proof, we actually need to construct $M-1$ auxiliary sequences $a^N_{(k)}$, obtained as

\begin{align}
a^N_{(k)}=\big\{a_i=\alpha_j\ \Rightarrow\ a_{(k),i}=\alpha_{\mathrm{mod}(k+j,M)},\ \forall j\in[1\ldots M]\big\},\quad k\in[0\ldots M-1], \tag{53}
\end{align}

Accordingly, we define $Y^N_{(k)}$ as the channel output obtained when the fading sequence is as in (53), that is,

\begin{align}
Y^N_{(k)}=X^N+c\,a^N_{(k)}S^N+Z^N_{(k)}. \tag{54}
\end{align}

Note that, as in the proof of Th. IV.1, we can associate a different noise $Z^N_{(k)}$ to each $Y^N_{(k)}$ in (54) and later choose the joint distribution among these noise terms. Since the symbols are equiprobable, we have that $P(A^N=a^N_{(k)})=P(A^N=a^N)$ for all $k$. Additionally, given the definition of typicality in (29), if $a^N\in\mathcal{T}_\epsilon^N(P_A)$, we have that also $a^N_{(k)}\in\mathcal{T}_\epsilon^N(P_A)$. As a last definition, let $Y_{(k)}(j)$ be the subset of positions of $Y^N_{(k)}$ in which $a_{(k),i}=\alpha_j$, for $j\in[1\ldots M]$, in the chosen ordering of $\mathcal{A}$, that is

\begin{align}
Y_{(k)}(j)=\big\{Y_{(k),i}\ \text{s.t.}\ a_{(k),i}=\alpha_j,\ i\in[1\ldots N]\big\},\quad\forall j\in[1\ldots M]. \tag{55}
\end{align}

Accordingly, $Y_{(0)}(m)$ is the subset of channel outputs in which $a_i=\alpha_m$, and $Y_{(k)}(\mathrm{mod}(m+k,M))$ is the same subset of outputs, but in which $a_{(k),i}=\alpha_{\mathrm{mod}(m+k,M)}$.
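The relabeling in (53) and the subsets in (55) are easy to mis-index, so a small sketch may help. It uses 0-based symbol indices (so $\alpha_j$ is index $j-1$), a random hypothetical fading sequence, and hypothetical helper names; it checks that $Y_{(0)}(m)$ and $Y_{(k)}(\mathrm{mod}(m+k,M))$ refer to the same output positions and that the cyclic relabeling preserves the type of $a^N$.

```python
import random
from collections import Counter

def shift_sequence(a_idx, k, M):
    """(53) with 0-based symbol indices: alpha_j -> alpha_{(j + k) mod M}."""
    return [(j + k) % M for j in a_idx]

def positions(seq, j):
    """(55): the positions i at which the sequence equals alpha_j."""
    return [i for i, v in enumerate(seq) if v == j]

M, N = 3, 24
rng = random.Random(7)
a_idx = [rng.randrange(M) for _ in range(N)]        # a^N as symbol indices
shifted = [shift_sequence(a_idx, k, M) for k in range(M)]

# Y_(0)(m) and Y_(k)(mod(m+k)) refer to the same output positions.
same_positions = all(
    positions(shifted[k], (m + k) % M) == positions(a_idx, m)
    for k in range(M) for m in range(M)
)

# The relabeling only permutes the symbol counts, so the type of a^N
# (and hence typicality with respect to a uniform P_A) is preserved.
counts = Counter(a_idx)
same_type = all(sorted(Counter(s).values()) == sorted(counts.values()) for s in shifted)
print(same_positions, same_type)
```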

A first part of the proof involves extending the bounding in (34b) to the case of any number of possible fading realizations $M=|\mathcal{A}|$. This derivation involves a recursion, which we illustrate using the case $M=3$: the general case is inferred from this derivation. We shall continue the derivation of the outer bound from (28d), focusing on the bounding of the term $-H(Y^N|W,A^N)$.

Fig. 4. An illustration of the sequences $Y^N_{(k)}$ and the subsequences $Y_{(k)}(j)$ for $k,j\in\{1,2,3\}$ in App. C.

• Case for M = 3 

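Consider $M=3$ and $\mathcal{A}=\{\alpha_1,\alpha_2,\alpha_3\}$ for some ordering of the elements in $\mathcal{A}$ and, as in (38), note that

\begin{align}
-H(Y^N|W,A^N) &\le -\sum_{a^N\in\mathcal{T}_\epsilon^N(P_A)}P(a^N)H(Y^N|W,A^N=a^N) \tag{56a}\\
&= -\frac{1}{|\mathcal{T}_\epsilon^N(P_A)|}\sum_{a^N\in\mathcal{T}_\epsilon^N(P_A)}H(Y^N|W,A^N=a^N)+\epsilon_{\rm all}. \tag{56b}
\end{align}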


The sequences $Y^N_{(k)}$ and the subsequences $Y_{(k)}(j)$ for $k,j\in\{1,2,3\}$ are illustrated in Fig. 4, from which we see that $Y_{(0)}(1)$, $Y_{(1)}(2)$ and $Y_{(2)}(3)$ are obtained from the same set of $X$s, $S$s and $Z$s but different fading values.

For this reason we can write

\begin{align}
&-H(Y^N|W,A^N=a^N) \tag{57a}\\
&= -\frac{1}{3}\Big(H\big(Y^N|W,A^N=a^N_{(0)}\big)+H\big(Y^N|W,A^N=a^N_{(1)}\big)+H\big(Y^N|W,A^N=a^N_{(2)}\big)\Big) \tag{57b}\\
&\le -\frac{1}{3}H\big(Y^N_{(0)},Y^N_{(1)},Y^N_{(2)}\big|W\big) \tag{57c}\\
&= -\frac{1}{3}H\big(Y_{(0)}(1),Y_{(0)}(2),Y_{(0)}(3),Y_{(1)}(1),Y_{(1)}(2),Y_{(1)}(3),Y_{(2)}(1),Y_{(2)}(2),Y_{(2)}(3)\big|W\big), \tag{57d}
\end{align}

where (57d) follows from the fact that the transformation of variables has unit Jacobian.

Using the definition of $Y_{(k)}(j)$ in (55), we conclude that the vector

\begin{align}
\big[Y_{(1)}(2)-Y_{(0)}(1),\ Y_{(0)}(2)-Y_{(2)}(1),\ Y_{(2)}(2)-Y_{(1)}(1)\big] \tag{58}
\end{align}

is a permutation of the vector

\begin{align}
c(\alpha_2-\alpha_1)S^N+Z^N_{21}, \tag{59}
\end{align}

where $Z^N_{21}$ is a permutation of the terms

\begin{align}
\big[Z_{(0)}(2)-Z_{(2)}(1),\ Z_{(1)}(2)-Z_{(0)}(1),\ Z_{(2)}(2)-Z_{(1)}(1)\big]. \tag{60}
\end{align}

We then have

\begin{align}
&-3H(Y^N|W,A^N=a^N) \tag{61a}\\
&\le -H\big(Y_{(0)}(2),Y_{(0)}(3),Y_{(1)}(2),Y_{(1)}(3),Y_{(2)}(2),Y_{(2)}(3)\big|W,c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big)-H\big(c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big|W\big), \tag{61b}
\end{align}

where (61b) follows from the fact that this transformation has unit Jacobian. Consider now the vector

\begin{align}
\big[Y_{(2)}(3)-Y_{(1)}(2),\ Y_{(1)}(3)-Y_{(0)}(2),\ Y_{(0)}(3)-Y_{(2)}(2)\big], \tag{62}
\end{align}

which is again a permutation of the vector

\begin{align}
c(\alpha_3-\alpha_2)S^N+Z^N_{32}, \tag{63}
\end{align}

where $Z^N_{32}$ is a permutation of the noise vector

\begin{align}
\big[Z_{(2)}(3)-Z_{(1)}(2),\ Z_{(1)}(3)-Z_{(0)}(2),\ Z_{(0)}(3)-Z_{(2)}(2)\big]. \tag{64}
\end{align}

With this definition we can write

\begin{align}
&-3H(Y^N|W,A^N=a^N) \tag{65a}\\
&\le -H\big(c(\alpha_3-\alpha_2)S^N+Z^N_{32}\big|c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big)-H\big(c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big) \nonumber\\
&\qquad -H\big(Y_{(0)}(3),Y_{(1)}(3),Y_{(2)}(3)\big|c(\alpha_2-\alpha_1)S^N+Z^N_{21},\,c(\alpha_3-\alpha_2)S^N+Z^N_{32},\,W\big) \tag{65b}\\
&\le -H\big(c(\alpha_3-\alpha_2)S^N+Z^N_{32}\big|c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big)-H\big(c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big)-H\big(Z^N_3\big), \tag{65c}
\end{align}

where $Z^N_3$ is a permutation of the noise terms

\begin{align}
\big[Z_{(0)}(3),\ Z_{(1)}(3),\ Z_{(2)}(3)\big]. \tag{66}
\end{align}
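The pairings in (58)-(59) and (62)-(63) can be verified on synthetic data: for each $k$, the outputs $Y_{(k)}(m+1)$ and $Y_{(k-1)}(m)$ sit at the same positions, so their difference cancels $X^N$ and leaves $c(\alpha_{m+1}-\alpha_m)S^N$ plus a noise difference. The sketch below assumes $M=3$, a hypothetical alphabet, unit-variance Gaussian samples and 0-based indices; the function name is hypothetical.

```python
import random

def check_difference_vectors(N=40, c=1.5, seed=3):
    """Build Y_(k) = X + c*a_(k)*S + Z_(k) for M = 3 and check (58)-(59) and (62)-(63)."""
    M = 3
    alpha = [0.5, 2.0, 8.0]                 # hypothetical alphabet alpha_1, alpha_2, alpha_3
    rng = random.Random(seed)
    a_idx = [rng.randrange(M) for _ in range(N)]
    X = [rng.gauss(0, 1) for _ in range(N)]
    S = [rng.gauss(0, 1) for _ in range(N)]
    Z = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]   # one noise per k
    shift = [[(j + k) % M for j in a_idx] for k in range(M)]
    Y = [[X[i] + c * alpha[shift[k][i]] * S[i] + Z[k][i] for i in range(N)]
         for k in range(M)]

    worst = 0.0
    for m in (1, 2):                        # m = 1: (58)-(59); m = 2: (62)-(63)
        for k in range(M):
            prev = (k - 1) % M
            pos = [i for i in range(N) if shift[k][i] == m]             # Y_(k)(m+1)
            pos_prev = [i for i in range(N) if shift[prev][i] == m - 1]  # Y_(k-1)(m)
            if pos != pos_prev:
                return False, float("inf")
            for i in pos:
                diff = Y[k][i] - Y[prev][i]
                expected = c * (alpha[m] - alpha[m - 1]) * S[i] + Z[k][i] - Z[prev][i]
                worst = max(worst, abs(diff - expected))
    return True, worst

print(check_difference_vectors())
```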


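The expression in (65c) is composed of vectors of independent terms, but the distributions of $Z^N_{21}$ and $Z^N_{32}$ might not be identical, since we have not yet chosen a joint distribution among the noise terms. At this point in the proof, we can set the noises to be independent, so that

\begin{align}
Z_{21,i},\ Z_{32,i}\sim\mathcal{N}(0,2), \tag{67}
\end{align}

iid for all $i\in[1\ldots N]$. We can now evaluate the terms in (65c) for this assignment, with $\Delta_1=\alpha_2-\alpha_1$ and $\Delta_2=\alpha_3-\alpha_2$, as

\begin{align}
H\big(c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big)=\frac{N}{2}\log 2\pi e\big(c^2\Delta_1^2+2\big), \tag{68}
\end{align}

and

\begin{align}
&H\big(c(\alpha_3-\alpha_2)S^N+Z^N_{32}\big|c(\alpha_2-\alpha_1)S^N+Z^N_{21}\big) \tag{69a}\\
&= N\,H\left(c\Delta_2\left(1-\frac{c^2\Delta_1^2}{c^2\Delta_1^2+2}\right)S+Z_{32}-\frac{c^2\Delta_2\Delta_1}{c^2\Delta_1^2+2}Z_{21}\right), \tag{69b}
\end{align}

where $Z_{32}$ and $Z_{21}$ are zero-mean Gaussian with variance two. This can be further simplified as

\begin{align}
&H\big(c(\alpha_3-\alpha_2)S+Z_{32}\big|c(\alpha_2-\alpha_1)S+Z_{21}\big)=H\big(c\Delta_2S+Z_{32}\big|c\Delta_1S+Z_{21}\big) \tag{70a}\\
&= \frac{1}{2}\log 2\pi e\left(c^2\Delta_2^2+2-\frac{c^4\Delta_1^2\Delta_2^2}{c^2\Delta_1^2+2}\right) \tag{70b}\\
&= \frac{1}{2}\log 2\pi e\left(\frac{2c^2(\Delta_1^2+\Delta_2^2)+4}{c^2\Delta_1^2+2}\right) \tag{70c}\\
&= \frac{1}{2}\log\left(\frac{2\big(c^2(\Delta_1^2+\Delta_2^2)+2\big)}{c^2\Delta_1^2+2}\right)+\frac{1}{2}\log 2\pi e \tag{70d}\\
&= \frac{1}{2}\log\left(\frac{c^2(\Delta_1^2+\Delta_2^2)+2}{c^2\Delta_1^2+2}\right)+\frac{1}{2}\log 2\pi e+\frac{1}{2}. \tag{70e}
\end{align}

The conditions in (17) for $M=3$ become

\begin{align}
\Delta_1^2 &\ge a, \tag{71a}\\
\Delta_2^2 &\ge (ac^2-1)\Delta_1^2,\ \text{ i.e., }\ \Delta_1^2+\Delta_2^2\ge ac^2\Delta_1^2, \tag{71b}
\end{align}

for some $a>0$, so that we can write

\begin{align}
-3H(Y^N|W,A^N=a^N)\le -2\cdot\frac{N}{2}\log(2\pi e c^2)-N\log a-\frac{N}{2}\log(2\pi e). \tag{72}
\end{align}

Note that when (71b) holds, the entropy term $H(Y^N|W,A^N=a^N)$ no longer depends on $a^N$, and thus we have that

\begin{align}
-\frac{1}{N}\sum_{a^N\in\mathcal{A}^N}P(a^N)H(Y^N|W,A^N=a^N)\le -\frac{1}{3}\log(2\pi e c^2)-\frac{1}{3}\log a-\epsilon_{\rm all}, \tag{73}
\end{align}

for some $\epsilon_{\rm all}$ which goes to zero as $N$ goes to infinity. Equation (73) follows, similarly to (40), from the fact that the typical set $\mathcal{T}_\epsilon^N(P_A)$ contains most of the probability and that the sequences in the typical set have an empirical distribution close to $P_A$.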
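The Gaussian algebra in (70b)-(70e) can be double-checked numerically by computing the conditional variance from the joint covariance of $c\Delta_2S+Z_{32}$ and $c\Delta_1S+Z_{21}$. The sketch below assumes $S\sim\mathcal{N}(0,1)$ and $Z_{21},Z_{32}\sim\mathcal{N}(0,2)$ independent, as in (67), and base-2 logarithms (which is what makes the trailing $\frac{1}{2}$ in (70e) correct); the helper names and test values are hypothetical.

```python
import math

LOG2_2PIE = math.log2(2 * math.pi * math.e)

def cond_entropy_direct(c, d1, d2):
    """(70b): h(c*d2*S + Z32 | c*d1*S + Z21) in bits, from the 2x2 covariance."""
    var_x = c ** 2 * d2 ** 2 + 2
    var_y = c ** 2 * d1 ** 2 + 2
    cov_xy = c ** 2 * d1 * d2          # the noises are independent, only S is shared
    return 0.5 * math.log2(2 * math.pi * math.e * (var_x - cov_xy ** 2 / var_y))

def cond_entropy_70c(c, d1, d2):
    num = 2 * c ** 2 * (d1 ** 2 + d2 ** 2) + 4
    return 0.5 * math.log2(2 * math.pi * math.e * num / (c ** 2 * d1 ** 2 + 2))

def cond_entropy_70e(c, d1, d2):
    ratio = (c ** 2 * (d1 ** 2 + d2 ** 2) + 2) / (c ** 2 * d1 ** 2 + 2)
    return 0.5 * math.log2(ratio) + 0.5 * LOG2_2PIE + 0.5

for c, d1, d2 in [(1.0, 1.0, 2.0), (0.3, 2.0, 4.0), (5.0, 0.5, 8.0)]:
    print(cond_entropy_direct(c, d1, d2), cond_entropy_70c(c, d1, d2), cond_entropy_70e(c, d1, d2))
```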

• Case for general M 

The derivation for the case $M=3$ can be extended to the general case by generalizing the bounding in (57), (61) and (70) to any $M$. Typicality, as in (73), can be invoked to obtain a bound on the term $H(Y^N|W,A^N)$. The bound in (57) produces the $M-1$ additional sequences $Y^N_{(k)}$ and the corresponding subsequences $Y_{(k)}(j)$, so that

\begin{align}
-M\,H(Y^N|W,A^N=a^N) &= -\sum_{k=0}^{M-1}H\big(Y^N|W,A^N=a^N_{(k)}\big) \tag{74a}\\
&\le -H\big(Y_{(0)}(1),\ldots,Y_{(0)}(M),\ldots,Y_{(M-1)}(1),\ldots,Y_{(M-1)}(M)\big|W\big). \tag{74b}
\end{align}

This expands on the bounding in (57):

• Obtain the term $c(\alpha_2-\alpha_1)S^N+Z^N_{21}$ as a combination of the terms $Y_{(k)}(2)-Y_{(\mathrm{mod}(k+M-1,M))}(1)$ from the entropy term in (74b): this transformation is composed of a circulant matrix and an identity matrix, which can be shown to have unit determinant. This term can then be removed from the term in (74b) using the definition of conditional entropy and bounded as $-\frac{N}{2}\log(2\pi e c^2)-\frac{N}{2}\log a$. This generalizes the passage in (61).

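• Successively remove the terms $c\Delta_iS^N+Z^N_i$, $i\in[2\ldots M-1]$, so that

\begin{align}
-H(Y^N|W,A^N=a^N)\le -\frac{1}{M}\sum_{i=1}^{M-1}N\,H\big(c\Delta_iS+Z_i\big|c\Delta_1S+Z_1,\ldots,c\Delta_{i-1}S+Z_{i-1}\big)-\frac{1}{M}H\big(Z^N_M\big), \tag{75}
\end{align}

where $\Delta_i=\alpha_{i+1}-\alpha_i$ and $Z_i$ is defined analogously to $Z_{21}$ in (60).

Each term $H(c\Delta_iS+Z_i|c\Delta_1S+Z_1,\ldots,c\Delta_{i-1}S+Z_{i-1})$ in (75) can be evaluated as

\begin{align}
H\big(c\Delta_iS+Z_i\big|c\Delta_1S+Z_1,\ldots,c\Delta_{i-1}S+Z_{i-1}\big)=\frac{1}{2}\log\left(4\pi e\,\frac{c^2\sum_{j=1}^{i}\Delta_j^2+2}{c^2\sum_{j=1}^{i-1}\Delta_j^2+2}\right). \tag{76}
\end{align}

This term, under the condition in (17), can be bounded as $\frac{1}{2}\log 2\pi e c^2+\frac{1}{2}\log a$. This generalizes the bounding in (70).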
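The closed form (76) is a ratio of determinants of covariance matrices of the form $2I+c^2\Delta\Delta^T$. The sketch below (pure Python, base-2 logarithms, hypothetical helper names, and exponentially spaced gaps chosen only for illustration) computes the conditional entropy from determinants obtained by Gaussian elimination and compares it with (76).

```python
import math

def covariance(c, deltas, noise_var=2.0):
    """Covariance of V_i = c*Delta_i*S + Z_i with S ~ N(0,1) and Z_i iid N(0, noise_var)."""
    n = len(deltas)
    return [[c * c * deltas[i] * deltas[j] + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, d = len(m), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= f * m[col][k]
    return d

def cond_entropy_numeric(c, deltas, i):
    """h(V_i | V_1..V_{i-1}) in bits via det(Sigma_1..i) / det(Sigma_1..i-1); i is 1-based."""
    num = det(covariance(c, deltas[:i]))
    den = det(covariance(c, deltas[:i - 1])) if i > 1 else 1.0
    return 0.5 * math.log2(2 * math.pi * math.e * num / den)

def cond_entropy_76(c, deltas, i):
    """Closed form (76)."""
    s_i = sum(d * d for d in deltas[:i])
    s_prev = sum(d * d for d in deltas[:i - 1])
    return 0.5 * math.log2(4 * math.pi * math.e * (c * c * s_i + 2) / (c * c * s_prev + 2))

deltas = [1.0, 2.0, 4.0, 8.0, 16.0]   # hypothetical, exponentially spaced gaps
c = 1.3
for i in range(1, len(deltas) + 1):
    print(i, cond_entropy_numeric(c, deltas, i), cond_entropy_76(c, deltas, i))
```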


With the above recursion, we come to the outer bound

\begin{align}
R^{\rm OUT}=\frac{1}{2}\log\big(P+(1+\mu_A^2)c^2+1\big)-\frac{M-1}{2M}\log\big((1+\mu_A^2)c^2\big)-\frac{M-1}{2M}\log a+\frac{1}{2}. \tag{77}
\end{align}
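Optimizing (77) follows the same steps as (41b)-(44), with $P_A'$ replaced by $1/M$ and $c^2$ by $t=(1+\mu_A^2)c^2$. The sketch below (base-2 logarithms, hypothetical names and values) checks numerically that the minimum of (77) over $t$ is attained at $t^*=(M-1)(P+1)$ and equals $\frac{1}{2M}\log(1+P)+\frac{1}{2}h_2(1/M)-\frac{M-1}{2M}\log a+\frac{1}{2}$, which is at most $\frac{1}{2M}\log(1+P)-\frac{M-1}{2M}\log a+1$.

```python
import math

def outer_bound_77(P, t, M, a):
    """(77) in bits, with t = (1 + mu_A^2) c^2 treated as the free variable."""
    return (0.5 * math.log2(P + t + 1) - (M - 1) / (2 * M) * math.log2(t)
            - (M - 1) / (2 * M) * math.log2(a) + 0.5)

def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def optimal_t(P, M):
    """Stationary point of (77) in t: the analogue of (42) with P_A' = 1/M."""
    return (M - 1) * (P + 1)

P, M, a = 20.0, 4, 0.5   # illustrative values
t_star = optimal_t(P, M)
value_at_star = outer_bound_77(P, t_star, M, a)
closed_form = (math.log2(1 + P) / (2 * M) + 0.5 * binary_entropy(1 / M)
               - (M - 1) / (2 * M) * math.log2(a) + 0.5)
loose_form = math.log2(1 + P) / (2 * M) - (M - 1) / (2 * M) * math.log2(a) + 1
print(t_star, value_at_star, closed_form, loose_form)
```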

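This expression corresponds to the expression in (41b) in the proof of Th. IV.1; consequently, it can be optimized over $c$ in the same way as that expression. This results in the outer bound

\begin{align}
R^{\rm OUT}=\begin{cases}
\frac{1}{2}\log\big(P+c^2(1+\mu_A^2)+1\big)-\frac{M-1}{2M}\log\big(c^2(1+\mu_A^2)\big)-\frac{M-1}{2M}\log a+\frac{1}{2}, & \frac{1}{M}c^2(1+\mu_A^2)\le\frac{M-1}{M}(P+1),\\[4pt]
\frac{1}{2M}\log(1+P)-\frac{M-1}{2M}\log a+1, & \frac{1}{M}c^2(1+\mu_A^2)>\frac{M-1}{M}(P+1).
\end{cases} \tag{78}
\end{align}

• Capacity inner bound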


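For the inner bound, consider the case in which the transmitter pre-codes against one of the realizations of the state times the fading. Let such a realization be $a'S^N$, so that we attain the rate

\begin{align}
R^{\rm IN}\ge\frac{1}{2}\log(1+P)-\frac{1}{2M}\sum_{a\in\mathcal{A},\,a\neq a'}\log\left(1+\frac{Pa^2c^2}{P+c^2a^2+1}\,\frac{(a-a')^2}{a^2}\right), \tag{79}
\end{align}

as in (48). Using the definition of $G$, this gives

\begin{align}
R^{\rm IN}\ge\frac{1}{2M}\log(1+P)-G. \tag{80}
\end{align}

By combining the scheme in (79) with the scheme that treats the fading-times-state as noise, we attain the bound

\begin{align}
R^{\rm IN}=\max_{\delta\in[0,1]}\left\{\frac{1}{2}\log\left(1+\frac{\bar{\delta}P}{1+c^2(1+\mu_A^2)+\delta P}\right)+\frac{1}{2M}\log(1+\delta P)-G\right\}, \tag{81}
\end{align}

with $\bar{\delta}=1-\delta$, and the optimization over $\delta$ yields

\begin{align}
R^{\rm IN}=\begin{cases}
\frac{1}{2}\log\left(1+\frac{P}{1+c^2(1+\mu_A^2)}\right), & \frac{1}{M}c^2(1+\mu_A^2)\le\frac{M-1}{M},\\[4pt]
\frac{1}{2}\log\big(P+c^2(1+\mu_A^2)+1\big)-\frac{M-1}{2M}\log\big(c^2(1+\mu_A^2)\big)-\frac{1}{2}h_2\!\left(\frac{1}{M}\right)-G, & \frac{M-1}{M}<\frac{1}{M}c^2(1+\mu_A^2)\le\frac{M-1}{M}(P+1),\\[4pt]
\frac{1}{2M}\log(1+P)-G, & \frac{1}{M}c^2(1+\mu_A^2)>\frac{M-1}{M}(P+1).
\end{cases} \tag{82}
\end{align}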
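The optimization over $\delta$ in (81) can be checked by brute force. The sketch below uses base-2 logarithms, writes $t=c^2(1+\mu_A^2)$ and $x=\delta P$, treats $G$ as a given constant, and uses hypothetical helper names and values. In the first regime the rate in (82) comes from treating the interference as noise alone, so the sketch only checks that it is no smaller than the value of (81) at $\delta=0$.

```python
import math

def objective_81(x, P, t, M, G):
    """Objective of (81) in bits, with x = delta*P and t = c^2 (1 + mu_A^2)."""
    return 0.5 * math.log2(1 + (P - x) / (1 + t + x)) + math.log2(1 + x) / (2 * M) - G

def delta_star_P(P, t, M):
    """Candidate maximizer of (81): the analogue of (50) with P_A' = 1/M."""
    return max(min((t - (M - 1)) / (M - 1), P), 0.0)

def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def inner_bound_82(P, t, M, G):
    """The three regimes of (82)."""
    if t <= M - 1:
        return 0.5 * math.log2(1 + P / (1 + t))
    if t <= (M - 1) * (P + 1):
        return (0.5 * math.log2(P + t + 1) - (M - 1) / (2 * M) * math.log2(t)
                - 0.5 * binary_entropy(1 / M) - G)
    return math.log2(1 + P) / (2 * M) - G

P, M, G = 15.0, 3, 0.25   # illustrative values
for t in (0.5, 5.0, 20.0, 40.0):
    x = delta_star_P(P, t, M)
    print(t, x, objective_81(x, P, t, M, G), inner_bound_82(P, t, M, G))
```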


• Gap between inner and outer bound 

The gap between inner and outer bound is obtained by comparing the expression in (78) with the expression in (82).

D. Proof of Th. IV.4

Similarly to the proof of Th. IV.1 in App. B, we bound the rate as

\begin{align}
N(R-\epsilon) &\le I(Y^N;W|A^N) \tag{83a}\\
&\le \frac{N}{2}\mathbb{E}_A\big[\log 2\pi e(P+A^2c^2+1)\big]-H(Y^N|W,A^N) \tag{83b}\\
&\le \frac{N}{2}\log 2\pi e\big(P+(1+\mu_A^2)c^2+1\big)-\int_{a^N\in\mathcal{A}^N}P(a^N)H(Y^N|W,A^N=a^N)\,da^N, \tag{83c}
\end{align}

where

\begin{align}
H(Y^N|A^N,W)=\int_{I^N}P(a^N)H(X^N+ca^NS^N+Z^N|W)\,da^N+\int_{\mathbb{R}^N\setminus I^N}P(a^N)H(X^N+ca^NS^N+Z^N|W)\,da^N. \tag{84}
\end{align}

Given the condition in (21), and since

\begin{align*}
\frac{N}{2}\log(2\pi e)\le H(X^N+ca^NS^N+Z^N|W)\le\frac{N}{2}\log\big(2\pi e(P+c^2+1)\big)+1,
\end{align*}

and $I^N$ is a closed interval, we can apply the mean value theorem and conclude that

\begin{align}
\int_{I^N}P(a^N)H(X^N+ca^NS^N+Z^N|W,A^N=a^N)\,da^N=P_A(I^N)\,H(X^N+ca'^NS^N+Z^N|W), \tag{85}
\end{align}

for some $a'^N\in I^N$. Note that this holds even if the distribution $P_{X^NS^N}$ has some discrete points, because of the convolution with the distribution of $Z^N$.
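The mean value step in (85) can be illustrated with a scalar stand-in. In the sketch below, which is an illustration only, the entropy term is replaced by the hypothetical function $\frac{1}{2}\log 2\pi e(P+c^2a^2+1)$ (in bits), the fading density is taken to be $\mathcal{N}(0,1)$, and $I=[0.5,2]$. Because this stand-in is continuous and increasing on $I$, bisection finds a point $a'\in I$ with $P_A(I)\,h(a')=\int_I P_A(a)h(a)\,da$.

```python
import math

def midpoint_integral(f, lo, hi, n=20000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def mean_value_point(density, h_fun, lo, hi, tol=1e-12):
    """Return (a', P(I), weighted mean of h_fun on I), with h_fun(a') equal to that mean.

    Assumes h_fun is continuous and increasing on [lo, hi], so bisection applies.
    """
    mass = midpoint_integral(density, lo, hi)
    target = midpoint_integral(lambda a: density(a) * h_fun(a), lo, hi) / mass
    left, right = lo, hi
    while right - left > tol:
        mid = 0.5 * (left + right)
        if h_fun(mid) < target:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right), mass, target

P, c = 4.0, 1.5   # illustrative values

def entropy(a):
    """Hypothetical scalar stand-in for the entropy term, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * (P + (c * a) ** 2 + 1))

def density(a):
    """Standard Gaussian density, used here only as an example fading density."""
    return math.exp(-a * a / 2) / math.sqrt(2 * math.pi)

lo, hi = 0.5, 2.0
a_mv, mass, target = mean_value_point(density, entropy, lo, hi)
print(a_mv, mass, target)
```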

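We can now write

\begin{align*}
&\int_{I^N}P(a^N)H(X^N+ca^NS^N+Z^N|W)\,da^N+\int_{\mathbb{R}^N\setminus I^N}P(a^N)H(X^N+ca^NS^N+Z^N|W)\,da^N\\
&=\big(P_A(I^N)-(1-P_A(I^N))\big)H(X^N+ca'^NS^N+Z^N|W)\\
&\qquad+\int_{\mathbb{R}^N\setminus I^N}P(a^N)\big(H(X^N+ca^NS^N+Z^N|W)+H(X^N+ca'^NS^N+Z^N|W)\big)\,da^N\\
&\ge\big(P_A(I^N)-(1-P_A(I^N))\big)H(X^N+ca'^NS^N+Z^N|X^N,S^N)\\
&\qquad+\int_{\mathbb{R}^N\setminus I^N}P(a^N)\,H\big(X^N+ca^NS^N+Z^N,\,X^N+ca'^NS^N+\tilde{Z}^N\big|W\big)\,da^N\\
&=\big(P_A(I^N)-(1-P_A(I^N))\big)\frac{N}{2}\log(2\pi e)\\
&\qquad+\int_{\mathbb{R}^N\setminus I^N}P(a^N)\,H\big(c(a^N-a'^N)S^N+Z^N-\tilde{Z}^N,\,X^N+ca'^NS^N+\tilde{Z}^N\big|W\big)\,da^N\\
&\ge\big(P_A(I^N)-(1-P_A(I^N))\big)\frac{N}{2}\log(2\pi e)\\
&\qquad+\int_{\mathbb{R}^N\setminus I^N}P(a^N)\Big(\frac{N}{2}\log 2\pi e\big(c^2(a^N-a'^N)^2+2\big)+H\big(\tilde{Z}^N\big|c(a^N-a'^N)S^N+Z^N-\tilde{Z}^N,S^N,X^N\big)\Big)\,da^N\\
&\ge N\left[\frac{1}{2}\log(2\pi e)+\frac{P_A(\mathbb{R}\setminus I)}{2}\log 2\pi e(1+\mu_A^2)c^2+\int_{\mathbb{R}\setminus I}\frac{P_A(a)}{2}\log\frac{(a-a')^2}{1+\mu_A^2}\,da\right]\\
&\ge N\left[\frac{1}{2}\log(2\pi e)+\frac{P_A(\mathbb{R}\setminus I)}{2}\log 2\pi e(1+\mu_A^2)c^2+G\right],
\end{align*}

where $\tilde{Z}^N$ denotes an independent copy of $Z^N$, which leaves the second entropy term in the first line unchanged.

This yields the same outer bound as (41b), but with an updated expression for $G$. As for Th. IV.1, we can optimize the expression in $c$ and obtain the same outer bound.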